Skip to content

compiler: Guard cond_mapper lookup in halo hoisting (fix nested HaloSpot KeyError)#2944

Open
ggorman wants to merge 1 commit into
mainfrom
fix-halospot-nested-cond-mapper
Open

compiler: Guard cond_mapper lookup in halo hoisting (fix nested HaloSpot KeyError)#2944
ggorman wants to merge 1 commit into
mainfrom
fix-halospot-nested-cond-mapper

Conversation

@ggorman
Copy link
Copy Markdown
Contributor

@ggorman ggorman commented Jun 4, 2026

Fixes #2943.

_hoist_redundant_from_conditionals (devito/passes/iet/mpi.py) iterated every halo_spot from _filter_iter_mapper and did an unguarded cond_mapper[hs0], raising KeyError for a nested HaloSpot (one whose subtree contains another HaloSpot) — _make_cond_mapper does not register nested HaloSpots as keys. This switches to the defensive cond_mapper.get(hs0) already used in _merge_halospots; the existing if not conditions: continue handles the absent-key case, so behaviour is preserved for present keys.

Minimal reproducer + before/after output are in #2943. The reproducer also verifies this exact fix by monkeypatching _make_cond_mapper to return a defaultdict(list). Confirmed on current main.

Happy to add a regression test under tests/ if you'd like — let me know the preferred home (it needs a nested-HaloSpot SubDomain layout).

_hoist_redundant_from_conditionals iterated every halo_spot from
_filter_iter_mapper and did an unguarded cond_mapper[hs0], which raised KeyError
for a nested HaloSpot (one whose subtree contains another HaloSpot), since
_make_cond_mapper does not register nested HaloSpots as keys. Use the defensive
cond_mapper.get(hs0) (as already done in _merge_halospots); the existing
'if not conditions: continue' handles the absent-key case. Fixes #2943.
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.44%. Comparing base (870a160) to head (f35e59d).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2944   +/-   ##
=======================================
  Coverage   83.44%   83.44%           
=======================================
  Files         249      249           
  Lines       52203    52203           
  Branches     4496     4496           
=======================================
+ Hits        43560    43562    +2     
+ Misses       7883     7882    -1     
+ Partials      760      759    -1     
Flag Coverage Δ
pytest-gpu-aomp-amdgpuX 68.58% <100.00%> (-0.02%) ⬇️
pytest-gpu-gcc- 78.17% <100.00%> (ø)
pytest-gpu-icx- 78.11% <100.00%> (+0.97%) ⬆️
pytest-gpu-nvc-nvidiaX 69.13% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@FabioLuporini FabioLuporini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raising KeyError for a nested HaloSpot (one whose subtree contains another HaloSpot)

why would there be nested HaloSpots -- that's the true culprit AI is failing to understand here

we're going to need a 10 lines reproducer

@ggorman
Copy link
Copy Markdown
Contributor Author

ggorman commented Jun 4, 2026

Thanks @FabioLuporini — fair question why a nested HaloSpot exists at all, so let me start there, then give you the short reproducer, then come back to whether .get() is the right place for the fix.

Where this came from

We hit this while implementing the staggered summation-by-parts (SBP-SAT)
finite-difference scheme of Gao & Keyes 2020 (doi:10.1190/geo2019-0566.1,
acoustic–elastic coupling) and the covariant acoustic SBP scheme of O'Reilly &
Petersson 2020 (doi:10.1016/j.jcp.2020.109386). Those schemes tile the domain
into many SubDomain-restricted zones (interior + per-boundary closure rows) and
emit per-zone Eqs that read neighbouring fields at offsets that cross the zone
boundaries. That layout — many overlapping, partially-nested halo families for the
same Function — is what produces the nested HaloSpot. The reproducer below strips
all the SBP machinery away and keeps only that essential shape.

Why there is a nested HaloSpot

I traced the IET. It is genuinely nested, and I now think the nesting itself is a
legitimate structure that the pass simply doesn't handle, rather than a
mis-construction — but I'd value your read on the last point.

At the IET level, the offending Operator's tree contains an outer HaloSpot
covering (A, B) whose body contains two inner HaloSpot(A) nodes. Dumping it
just before _hoist_redundant_from_conditionals runs:

total HaloSpots in IET      : 7
nested HaloSpots (parent)   : 1
  OUTER HaloSpot(A, B) contains INNER HaloSpot(A), HaloSpot(A)
loop-iterated halo_spots NOT in cond_mapper (-> KeyError) : 2  [HaloSpot(A), HaloSpot(A)]

The crash is an asymmetry between the two mappers the pass builds from the same
IET
(devito/passes/iet/mpi.py, current main):

  • cond_mapper = _make_cond_mapper(iet) (mpi.py:463) is built from
    MapHaloSpots().visit(iet). MapHaloSpots is visit_HaloSpot = MapKind.visit_dummy (devito/ir/iet/visitors.py:989-990), and visit_dummy
    records the node but does not recurse into the HaloSpot's children
    (visitors.py:965-969). So a HaloSpot that lives inside another HaloSpot's
    body
    is never visited → it is not a key in cond_mapper.

  • iter_mapper = _filter_iter_mapper(iet) (mpi.py:449) is built from
    MapNodes(Iteration, HaloSpot, 'immediate'). MapNodes.visit_Node
    (visitors.py:1041-1066) records a HaloSpot under its enclosing Iteration and
    then, because a HaloSpot is not the parent_type, continues recursing into
    the HaloSpot's body
    (the else branch, visitors.py:1062-1064). So it does
    collect the inner, nested HaloSpots and attaches them to the same Iteration.

_hoist_redundant_from_conditionals then iterates iter_mapper's halo_spots
(which include the nested ones) and does an unguarded subscript
(mpi.py:91):

for hs0 in halo_spots:
    conditions = cond_mapper[hs0]   # KeyError: nested HaloSpot is not a key

The nested HaloSpot(A) is in halo_spots but not in cond_mapper, so the
subscript raises. Notably the same map is read defensively a few lines down
in _check_control_flowcond_mapper.get(hs0) / .get(hs1) (mpi.py:485-486)
— so the rest of the module already assumes a halo_spot may be absent; line 91 is
the one spot that didn't get the guard.

So to answer "why would there be nested HaloSpots": they arise whenever several
overlapping SubDomain-restricted halo families for the same Function get
scheduled such that one HaloSpot's subtree ends up containing another. The two
mappers disagree on whether nested HaloSpots are "real" — MapNodes says yes,
MapHaloSpots says no — and the pass trusts the first while indexing the second.

It reproduces on 4.8.21 and on current main
(4.8.22.dev221+gcf29367ce, commit cf29367ce).

Minimal reproducer (~16 lines)

Two accumulator families (a1, a2) over a 3×3 grid of SubDomains, each reading
a second Function B at an offset but in a different subset of rows → the
overlapping B halo families nest. Default opt crashes; opt='noop' is clean.

from devito import Grid, Function, Eq, Operator, SubDomain

n, nr = 10, 2
def mk(nm, tx, ty):
    return type(nm, (SubDomain,), {"name": nm,
        "define": lambda self, d: {d[0]: tx, d[1]: ty}})()
th = [("middle", i, n - 1 - i) for i in range(nr)] + [("middle", nr, nr)]
subs = [mk(f"z{i}_{j}", th[i], th[j]) for i in range(nr + 1) for j in range(nr + 1)]
grid = Grid(shape=(n, n), subdomains=tuple(subs))
x, y = grid.dimensions
A, B = Function(name="A", grid=grid), Function(name="B", grid=grid)
a1, a2 = Function(name="a1", grid=grid), Function(name="a2", grid=grid)
eqs = [Eq(a1, A[x + 1, y] + (B[x + 2, y] if i >= 1 else 0), subdomain=subs[i * (nr + 1) + j])
       for i in range(nr + 1) for j in range(nr + 1)]
eqs += [Eq(a2, A[x - 1, y] + (B[x + 3, y] if i >= 2 else 0), subdomain=subs[i * (nr + 1) + j])
        for i in range(nr + 1) for j in range(nr + 1)]
Operator(eqs)()  # default opt -> KeyError; Operator(eqs, opt='noop') works

Traceback (on main, MPI off, default opt):

File ".../devito/operator/operator.py", line 508, in _lower_iet
    graph = cls._specialize_iet(graph, **kwargs)
File ".../devito/core/cpu.py", line 210, in _specialize_iet
    mpiize(graph, **kwargs)
File ".../devito/passes/iet/mpi.py", line 398, in mpiize
    optimize_halospots(graph, **kwargs)
File ".../devito/passes/iet/mpi.py", line 28, in optimize_halospots
    iet = _hoist_redundant_from_conditionals(iet)
File ".../devito/passes/iet/mpi.py", line 91, in _hoist_redundant_from_conditionals
    conditions = cond_mapper[hs0]
                 ~~~~~~~~~~~^^^^^
KeyError: <HaloSpot(A)>

I've reduced from 9 to 4 SubDomains and to a single accumulator family, but it
stops nesting (and stops crashing) at those reductions — the nesting needs the
overlapping multi-zone × multi-family structure, so ~16 lines is about as small as
I could get it while still triggering the real path. Happy to trim further if
there's a shape you'd prefer.

Is .get() the right fix, or should it move upstream?

Honest read: the one-line .get() makes line 91 consistent with the rest of the
module
(_check_control_flow already uses .get() on the same map, and the
existing if not conditions: continue already handles the None/empty case), so
it's a safe, behaviour-preserving crash-guard. But I don't want to oversell it as
the fix, because — as you suspected — it papers over the deeper inconsistency:
_make_cond_mapper/MapHaloSpots and _filter_iter_mapper/MapNodes disagree
about whether nested HaloSpots exist
, and the pass then walks the union.

So there are really two questions, and I'd like your call:

  1. Should nested HaloSpots be allowed in the IET at this stage at all? If the
    intended invariant is that optimize_halospots sees a flat list of HaloSpots
    per Iteration (no nesting), then the true fix is upstream — either flatten/merge
    the nested HaloSpots before this pass, or fix the scheduling that produces the
    nesting — and .get() is only masking it. If you point me at where that
    invariant should hold, I'm happy to move the fix there.

  2. If nesting is legitimate, then the consistent fix is to make
    _make_cond_mapper recurse into HaloSpot bodies (so nested HaloSpots get keys
    too), rather than just guarding the one lookup — that closes the asymmetry at
    the source rather than at one of its consumers. I can do that instead of (or in
    addition to) the .get().

From the outside the nesting looks like a genuine consequence of overlapping
SubDomain halo families rather than a corrupt tree, which would point at (2) — but
I want to be upfront that I haven't verified that making _make_cond_mapper recurse
is safe for the legitimately-flat cases, so I'm not putting it forward as a ready
patch, just an observation. You know the halo-pass invariants far better than I do,
so I'd rather you tell me which invariant is intended; I'm happy to prototype either
(the upstream flatten/scheduling fix, or the recursing _make_cond_mapper) and
verify it against the reproducer before we settle the PR. Either way I'm glad to add
a regression test under tests/ once we agree on the shape — just point me at the
preferred home for a nested-HaloSpot SubDomain fixture.

In the meantime we're carrying opt='noop' locally, which sidesteps the pass
entirely and leaves the numerics unchanged.

@FabioLuporini
Copy link
Copy Markdown
Contributor

you crazy if you think I read that entire thing 😂

@FabioLuporini
Copy link
Copy Markdown
Contributor

in that MFE, what does the clusters list look like at this point?

https://github.com/devitocodes/devito/blob/main/devito/ir/clusters/algorithms.py#L53

@ggorman
Copy link
Copy Markdown
Contributor Author

ggorman commented Jun 4, 2026

At algorithms.py:53 (going into communications) the MFE has 18 clusters: 9 writing a1, 9 writing a2 — one per SubDomain — all with identical intervals [ix[0,0], iy[0,0]], no guards, and not fused:

[0..8]   writes=a1[ix+1, iy+1]   intervals=[(ix,0,0),(iy,0,0)]   guards=-
[9..17]  writes=a2[ix+1, iy+1]   intervals=[(ix,0,0),(iy,0,0)]   guards=-

communications then expands the list to 36 (the HaloTouch clusters). (devito 4.8.22.dev46+g7aae8d96f, MPI off.)

So it's 18 same-iteration-space, per-SubDomain clusters over the shared Functions A/B — the nested HaloSpot downstream comes out of that. Happy to dump anything else at that point (e.g. the halo schemes, or the IET just before optimize_halospots)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

compiler: KeyError in MPI halo hoisting on nested HaloSpot (unguarded cond_mapper lookup)

3 participants